\subsection{Query Candidate Ranking}
\label{sec:ranking}

After the above two steps, SQL queries satisfying the given input-output examples
will be returned. To reduce the effort in inspecting the
results and selecting a desirable query, we devise a strategy
to put the most likely query near the top of the return list.
%Due to lacking in input-output examples we are not be able to rule out SQL
%queries that lack in generalization ability. Instead we will provide user queries according to a rank.

Our strategy is based on Occam's razor to
compute a cost for each returned query, and prefers
queries with lower costs.
%queries in the increasing order of their costs.
For a SQL query, each table appearing in it introduces a cost $C_t$ and each appearing condition 
introduces an additional cost $C_p$. The total cost of a query is
computed by: $n_t \cdot C_t+n_p \cdot C_p$, where
$n_t$ is the number of tables and $n_p$ is the number of predicates used in the query. Our technique 
ranks queries based on their costs in an increasing order to ensure that a simpler query often ranks higher.
Figure~\ref{fig:rank} shows an example to illustrate this ranking strategy.


\begin{figure}[t]

\centering
\begin{tabular}{|l|l|}
%\hline
\multicolumn{2}{l}{Input table: \textbf{student}}\\\hline
%\hline
\textbf{name} & \textbf{score} \\\hline
Bob & 4  \\\hline
Dan & 5  \\\hline
Jim & 2  \\\hline
\end{tabular}
\quad
\begin{tabular}{|l|}
%\hline
\multicolumn{1}{l}{Query output}\\\hline
%\hline
\textbf{name}\\\hline
Bob  \\\hline
Dan  \\\hline
\end{tabular}

\vspace{2mm}


Query 1:
\vspace{-4mm}
\begin{CodeOut}
\begin{alltt}
\centering
\textbf{select name from student where score > 3;}
\end{alltt}
\end{CodeOut}

Query 2:
\vspace{-4mm}
\begin{CodeOut}
\begin{alltt}
\centering
\textbf{select name from student where name = `Bob` or name = `Dan`
}
\end{alltt}
\end{CodeOut}
\vspace{-5mm}
\Caption{{\label{fig:rank} The illustration of our query candidate ranking
strategy. Both Query 1 and Query 2 can transform the example input
to the example output. However, based on our ranking strategy, Query 1
is ranked higher, since it contains less conditions and is simpler
than Query 2.
An inferred query containing more conditions like Query 2 is more likely
to overfit the given examples.}}
\end{figure}
